-
- REPLACE: Search & Replace Pattern Programming Compiler 29th May '92
-
- Replace 1.00: Copyright (C) 1991 Michael Houlder
-
-
- 0. NOTE
- =======
- Please read this file using the system font.
-
-
- 1. Purpose
- ==========
- REPLACE acts as a compiler for a search and replace pattern language based on UNIX regular
- expressions and very similar to the languages used by TWIN and !SrcEdit.
-
- TWIN is the original screen text editor which runs under the command line interpreter.
- !SrcEdit is the new multi-tasking text editor supplied as part of Acorn's new Desktop
- Development Environment (DDE).
-
- The 'wildcarded expression' facility allows structures within complex data files to be
- described, found and modified in a simple, data-oriented manner, with little of the procedural
- detail that conventional programming demands.
-
- The ordered execution of many search & replace instructions on the same data can have the
- effect of sophisticated procedural programming.
-
- The compiler takes as input a pattern language source file, in ASCII text, and optionally one
- or more data files.
-
- The use of a permanent disk text file to hold the language source allows accumulation of
- many search & replace instructions for ordered execution on the same data. Hence REPLACE
- provides a new style of programming: pattern programming.
-
- With a permanent disk file, development and accumulation of a program over a period of time is
- possible.
-
- In this respect, REPLACE differs from both TWIN and !SrcEdit. These take their search and
- replace instructions from the keyboard and keep no permanent record of them for future use.
-
- In writing REPLACE, I took the opportunity to formalise fully the syntax and, in particular,
- the semantics of the search & replace language defined by !SrcEdit.
-
- However, !SrcEdit and REPLACE can be used together to considerable advantage. !SrcEdit allows
- the development and testing of fragments of a REPLACE pattern program on an interactive and
- incremental basis.
-
-
- 2. References
- =============
- (a) The DDE format !help files contained in the two applications associated with this
- documentation: !Replace and !Styles.
-
- (b) The three parts of my article "Power Search: A Quiet AI Revolution" in Archive magazine,
- Vols 5.7, 5.8 and 5.9 (April, May and June respectively). These are essential for
- understanding the semantics of the pattern programming language.
-
- (c) The Acorn User Guide for TWIN, pages 25 to 32.
-
- (d) The Acorn DDE User Guide for !SrcEdit, pages 99 to 103.
-
- (e) SUN UNIX documentation for LEX, page 123 onwards.
-
- (f) Tony Mason & Doug Brown, "Lex & Yacc", O'Reilly & Associates 1990, page 116 onwards.
- This is a reasonably good, but quite expensive, introduction to the notorious UNIX
- language development tools, LEX (LEXical analyser) & YACC (Yet Another Compiler Compiler).
-
- (g) Kleene S.C., "Representation of events in nerve nets and finite automata", Rand Memorandum,
- Dec 1951. This is the absolutely fundamental academic paper in which Prof. Kleene introduced
- the notion of 'regular expressions'.
-
-
- 3. Restricted Function
- ======================
- REPLACE 1.00 is, hopefully, not the last word in pattern programming. I currently have
- version 2 under test, which adds iteration (equivalent in power to recursion), conditionals,
- functional abstraction, libraries and more. The aim in developing the language is to
- minimise procedural, algorithmic programming.
-
-
- 4. Definitions
- ==============
- The basic idea is pattern matching with replacement. A search pattern is matched against the
- data; the way the two correspond defines a replacement string, which is substituted for the
- matched data.
-
- The search pattern is given by a sequence of terms in the search pattern language. The terms
- and the way they come together are described by the grammar given below.
-
- The search sequence (of terms) corresponds to that entered into the "Find" writeable
- icon in the !SrcEdit "Find Text" dialogue.
-
- The replace sequence is a sequence of terms in the replacement pattern language, also
- described by a grammar below. It defines a pattern rather than a fixed string, because the
- replacement need not be constant: it can depend on the match actually found between the
- search pattern and the data.
-
- The replace sequence (of terms) corresponds to that entered into the "Replace with"
- writeable icon in the !SrcEdit "Find Text" dialogue.
-
- There is one syntax difference between REPLACE and !SrcEdit. Sequences in REPLACE must be
- expressible in ASCII text, but !SrcEdit introduces a hexadecimal character code with the
- symbol ☓, which is not an ASCII character. REPLACE therefore uses '!' instead: e.g. "!09"
- corresponds to "☓09", which stands for a tab character.
-
- Both search and replace sequences in REPLACE must be surrounded by quotes: e.g. "!09".
- A replace sequence may be null: e.g. "".
-
- A search sequence and a replace sequence combine to form a TRANSFORM. A transform joins the two
- with the connective '=>' and is terminated by ';':
-
- e.g. "!09" => "";
-
- This transform replaces each tab character with the empty string, i.e. deletes every tab.
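- As a rough illustration only (not how REPLACE itself is implemented), the following Python
- sketch decodes the hex operator and applies the transform "!09" => ""; to a string. The
- function name decode_hex is hypothetical, and it assumes the operator is always followed by
- exactly two hexadecimal digits.

```python
import re

def decode_hex(sequence):
    """Hypothetical helper: turn each hex operator '!hh' into the character
    with code hh. Assumes exactly two hex digits; ignores the rest of the
    search language (sets, repetitions, the normal operator)."""
    return re.sub(r"!([0-9A-Fa-f]{2})",
                  lambda m: chr(int(m.group(1), 16)), sequence)

# The transform "!09" => ""; held as a (search, replace) pair of strings.
search, replace = decode_hex("!09"), decode_hex("")

data = "name\tvalue\tunit"
# str.replace substitutes every occurrence, so every tab is deleted.
print(data.replace(search, replace))  # prints namevalueunit
```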
-
-
- A pattern program consists of a list of transforms: e.g.
-
- "!09" => ""; /* deletes tabs */
- "$" => ""; /* deletes newlines (linefeeds) */
- " " => ""; /* deletes spaces */
-
- Comments are defined by enclosing the comment within /* . . . */. This follows the C
- programming language convention. Comments may occur anywhere except within either a
- search or a replace sequence.
-
- If a search sequence needs to refer to a comment delimiter, it must use the 'normal' operator
- '\' so that '*' is taken as an ordinary character rather than as a repetition symbol: e.g.
-
- "/\**.\*/" => ""; /* deletes comments */
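- The transform syntax is simple enough to split mechanically. This Python sketch turns the
- text of a pattern program into a list of (search, replace) pairs, discarding comments that
- lie outside sequences. The name parse_program is hypothetical, and it assumes that a '\'
- inside a sequence protects the next character, so that a quote can be written as \".

```python
def parse_program(source):
    """Hypothetical parser: return the program's transforms as (search, replace)
    pairs. Assumes '\\' inside a sequence protects the next character."""
    tokens, i, n = [], 0, len(source)
    while i < n:
        if source.startswith("/*", i):              # comment: skip to "*/"
            end = source.find("*/", i + 2)
            if end < 0:
                raise ValueError(f"unterminated comment at {i}")
            i = end + 2
        elif source[i] == '"':                      # quoted sequence
            j = i + 1
            while j < n and source[j] != '"':
                j += 2 if source[j] == "\\" else 1  # skip an escaped character
            if j >= n:
                raise ValueError(f"unterminated sequence at {i}")
            tokens.append(("SEQ", source[i + 1:j]))
            i = j + 1
        elif source.startswith("=>", i):
            tokens.append(("ARROW", "=>"))
            i += 2
        elif source[i] == ";":
            tokens.append(("SEMI", ";"))
            i += 1
        elif source[i].isspace():
            i += 1
        else:
            raise ValueError(f"unexpected {source[i]!r} at {i}")
    transforms = []
    for k in range(0, len(tokens), 4):              # SEQ => SEQ ;
        group = tokens[k:k + 4]
        if [kind for kind, _ in group] != ["SEQ", "ARROW", "SEQ", "SEMI"]:
            raise ValueError(f"malformed transform {k // 4 + 1}")
        transforms.append((group[0][1], group[2][1]))
    return transforms

program = '''
"!09" => "";        /* deletes tabs */
"/\\**.\\*/" => "";  /* deletes comments */
'''
for search, replace in parse_program(program):
    print(repr(search), "=>", repr(replace))
```

- Because whole quoted sequences are consumed before comments are looked for, the "/*" inside
- the comment-deleting search sequence is not mistaken for the start of a comment.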
-
-
- 5. Operation
- ============
- A multi-tasking interface conforming to the DDE standard is provided, and there are two modes
- of operation: compilation only, and compilation with execution.
-
- Compilation only occurs when a program is provided but no data files. In REPLACE 1.00, this
- lets the compiler report how it has understood the program source: it produces a listing that
- translates the compiled pseudo code back into source form.
-
- Compilation with execution occurs when both program source and data files are provided.
-
- Each transform in turn is taken from the list and applied to each data file, from start to end,
- replacing every match found. This is equivalent to a global replacement, i.e. selecting the
- "End of file Replace" icon in the !SrcEdit "Text Found" dialogue.
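- The order of execution can be sketched in Python as follows, assuming each transform has
- already been compiled into a Python regular expression and a replacement string (run_program
- and that compiled form are hypothetical, not REPLACE's pseudo code). The two calls at the end
- show why the order of transforms in a program matters.

```python
import re

def run_program(transforms, files):
    """Apply each transform in turn, as a global replacement, to every file.

    transforms: list of (compiled regex, replacement string) pairs.
    files: dict mapping a file name to its text (left unmodified).
    """
    results = dict(files)
    for pattern, replacement in transforms:      # program order
        # re.sub scans each text from start to end and replaces every
        # non-overlapping match, like "End of file Replace".
        results = {name: pattern.sub(replacement, text)
                   for name, text in results.items()}
    return results

tabs_to_spaces = (re.compile("\t"), " ")
delete_spaces = (re.compile(" "), "")
data = {"a.txt": "x\ty z"}
print(run_program([tabs_to_spaces, delete_spaces], data))  # {'a.txt': 'xyz'}
print(run_program([delete_spaces, tabs_to_spaces], data))  # {'a.txt': 'x yz'}
```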
-
- For an explanation of how matches are found and how replacements are made, please read the
- Archive article given as reference (b) in section 2.
-
-
- 6. Search Sequence Grammar
- ==========================
- * a search sequence is a list of one or more search components, at
- least one of which is not a '0 or more' repetition.
-
- * a search component is one of: a specific character, a specific string,
- a set of characters, or a repetition.
-
- * a specific character is any character that is not a search pattern
- control symbol; a control symbol becomes a specific character when it
- has been normalised by the normal operator.
-
- * a specific string is any string composed of specific characters put
- together.
-
- * a set of characters is either a pre-defined set or a user-defined set.
-
- * a repetition is a repetition control symbol followed by either a
- specific character or a set of characters.
-
- * a user-defined set is either a bracketed set or a negated set.
-
- * a search pattern control symbol is one of: a pre-defined set symbol, a
- set construction symbol, a repetition control symbol, or a character
- operator.
-
- * a pre-defined set is one of the four symbols: '.', '$', '@', '#'.
- Respectively, these are named for interpretation purposes as: 'any',
- 'newline', 'alphanum', 'digit'.
-
- * a set construction symbol is one of the four symbols: '[', ']', '~', '-'.
- Respectively, these are named as: 'left set', 'right set', 'not', and
- 'to'.
-
- * a repetition control symbol is one of the three symbols: '*', '^', '%'.
- Respectively, these are named as: '0 or more', '1 or more', or
- 'most'.
-
- * a character operator is one of the three symbols: '\', '|', '!'.
- Respectively, these are named as: 'normal', 'control', or 'hex'
- operators. They do not themselves match anything in the data.
- They change the status of the character or characters that follow
- them, either by normalising search pattern control characters or by
- encoding characters as hexadecimal or non-printable ASCII values.
- !SrcEdit uses a non-printable graphics symbol '☓' instead of '!'.
-
- * a bracketed set is a combination of character lists and character
- ranges put together in any order between a 'left set' bracket and a
- 'right set' bracket.
-
- * a character list is a list composed of characters that are not set
- construction symbols unless they are normalised or out of context.
- For instance, in "[abc[d]", the second '[' cannot be 'left set' and is,
- therefore, out of context. Similarly, in "[-xyz]", '-' cannot be the range
- symbol 'to'.
-
- * a character range is a list of three characters with the middle one
- being the 'to' symbol and the outer two being characters that are not
- set construction symbols unless they are normalised or out of
- context. The outer two are not required to be in any order; i.e. "[a-t]"
- is equivalent to "[t-a]".
-
- * a negated set is the set construction symbol 'not' followed by either
- a specific character, a pre-defined set symbol, or a bracketed set.
- Such a set may not be empty, so "~." (not 'any') is excluded.
-
- * For no good reason, !SrcEdit excludes the repetition "%.". As this
- component is important, standing for "the rest of the file", REPLACE
- supports it.
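- As an illustration of the grammar, the Python sketch below translates a search sequence into
- a Python regular expression. It is not REPLACE's own algorithm, and several semantic details
- are assumptions rather than facts from this document: '*' and '^' match as few characters as
- possible and '%' as many as possible; '@' means ASCII letters and digits; "|X" gives the
- control code of X (so "|I" is a tab). '.' is taken to include newline, so that "%." reaches
- the end of the file. Sets and repetitions become capture groups, assumed to correspond to the
- numbered fields of the replace sequence. All function names are hypothetical.

```python
import re

# Pre-defined sets as regex class bodies ('any', 'newline', 'alphanum', 'digit').
# Assumptions: '.' includes newline; '@' means ASCII letters and digits.
PREDEFINED = {".": r"\s\S", "$": r"\n", "@": "A-Za-z0-9", "#": "0-9"}
# Assumed: '*' (0 or more) and '^' (1 or more) are shortest; '%' is longest.
REPEAT = {"*": "*?", "^": "+?", "%": "*"}
OPERATORS = "\\|!"  # 'normal', 'control', 'hex'
CONTROL = set(PREDEFINED) | set(REPEAT) | set("[]~-") | set(OPERATORS)

def _operator(seq, i):
    """Decode the character operator at seq[i]; return (char, next index)."""
    op, rest = seq[i], seq[i + 1:]
    if op == "\\" and rest:                     # normal: next char as-is
        return rest[0], i + 2
    if op == "|" and rest:                      # control (assumed: |I is tab)
        return chr(ord(rest[0]) & 0x1F), i + 2
    if op == "!" and re.fullmatch(r"[0-9A-Fa-f]{2}", rest[:2]):
        return chr(int(rest[:2], 16)), i + 3    # hex: "!09" is tab
    raise ValueError(f"incomplete {op!r} operator at {i}")

def _set_char(seq, i):
    """A character inside [...]: only the character operators are special."""
    return _operator(seq, i) if seq[i] in OPERATORS else (seq[i], i + 1)

def _bracket(seq, i):
    """Parse a bracketed set after its '['; return (class body, next index)."""
    items = []
    while i < len(seq) and seq[i] != "]":
        if (seq[i] == "-" and items and isinstance(items[-1], str)
                and i + 1 < len(seq) and seq[i + 1] != "]"):
            hi, i = _set_char(seq, i + 1)       # 'to': range in either order
            items.append(tuple(sorted((items.pop(), hi))))
        else:                                   # out-of-context '-', '[': literal
            ch, i = _set_char(seq, i)
            items.append(ch)
    if i >= len(seq) or not items:
        raise ValueError("unterminated or empty bracketed set")
    body = "".join(re.escape(x) if isinstance(x, str)
                   else f"{re.escape(x[0])}-{re.escape(x[1])}" for x in items)
    return body, i + 1

def _specific(seq, i):
    """A specific character outside [...]: control symbols must be normalised."""
    if seq[i] in OPERATORS:
        return _operator(seq, i)
    if seq[i] in CONTROL:
        raise ValueError(f"control symbol {seq[i]!r} at {i} is not allowed here")
    return seq[i], i + 1

def _operand(seq, i):
    """Return (regex, next index, is_set) for the character or set at seq[i]."""
    c = seq[i]
    if c in PREDEFINED:
        return f"[{PREDEFINED[c]}]", i + 1, True
    if c == "[":
        body, i = _bracket(seq, i + 1)
        return f"[{body}]", i, True
    if c == "~":                                # negated set
        nxt = seq[i + 1] if i + 1 < len(seq) else "."
        if nxt == ".":
            raise ValueError("'~' needs an operand, and \"~.\" is excluded")
        if nxt in PREDEFINED:
            body, i = PREDEFINED[nxt], i + 2
        elif nxt == "[":
            body, i = _bracket(seq, i + 2)
        else:
            ch, i = _specific(seq, i + 1)
            body = re.escape(ch)
        return f"[^{body}]", i, True
    ch, i = _specific(seq, i)
    return re.escape(ch), i, False

def translate_search(seq):
    """Translate a REPLACE search sequence into a compiled Python regex."""
    parts, all_star, i = [], True, 0
    while i < len(seq):
        if seq[i] in REPEAT:                    # repetition: symbol + operand
            if i + 1 >= len(seq):
                raise ValueError(f"repetition at {i} has no operand")
            regex, nxt, _ = _operand(seq, i + 1)
            parts.append(f"({regex}{REPEAT[seq[i]]})")
            all_star = all_star and seq[i] == "*"
            i = nxt
        else:
            regex, i, is_set = _operand(seq, i)
            parts.append(f"({regex})" if is_set else regex)
            all_star = False
    if not parts or all_star:
        raise ValueError("need a component that is not a '0 or more' repetition")
    return re.compile("".join(parts))

comment = translate_search("/\\**.\\*/")
print(comment.sub("", "a/* x */b/* y */c"))  # prints abc
```

- Inside brackets, only the three character operators are treated as special, following the
- character list rule above; a '-' that cannot be 'to' is taken literally, as in "[-xyz]".
- Under the shortest-match assumption, the comment example deletes each comment separately
- rather than everything from the first "/*" to the last "*/".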
-
-
- 7. Replace Sequence Grammar
- ===========================
- * a replace sequence is a list of zero or more replace components.
-
- * a replace component is one of: a specific character, a specific string, a newline, a numbered field,
- or a found string.
-
- * a specific character is any character other than the three replace control symbols '$',
- '?' and '&', unless it has been normalised by the normal operator.
-
- * a specific string is any string composed of specific characters put together.
-
- * a newline is the replace control symbol '$'.
-
- * a numbered field is the replace control symbol '?' followed by a single decimal digit, '0' to '9'.
-
- * a found string is the replace control symbol '&'.
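- The replace grammar can be sketched the same way. This Python function builds the replacement
- text for one match. It assumes that field ?n is the (n+1)th capture group of the search regex
- (fields numbered from 0 in order of the wildcarded search components) and handles only the
- normal operator. The name expand_replace is hypothetical, and the search regex in the usage
- lines is written by hand rather than compiled from a search sequence.

```python
import re

def expand_replace(seq, match):
    """Hypothetical helper: build the replacement for one regex match from a
    REPLACE replace sequence. Assumes field ?n is capture group n + 1; only
    the normal operator '\\' is handled."""
    out, i = [], 0
    while i < len(seq):
        c = seq[i]
        if c == "\\" and i + 1 < len(seq):        # normal: next char literal
            out.append(seq[i + 1])
            i += 2
        elif c == "$":                            # newline
            out.append("\n")
            i += 1
        elif c == "&":                            # found string: whole match
            out.append(match.group(0))
            i += 1
        elif c == "?":                            # numbered field ?0 .. ?9
            if i + 1 >= len(seq) or seq[i + 1] not in "0123456789":
                raise ValueError(f"'?' at {i} must be followed by a digit")
            out.append(match.group(int(seq[i + 1]) + 1) or "")
            i += 2
        else:                                     # specific character
            out.append(c)
            i += 1
    return "".join(out)

# Search regex written by hand: field 0 is a name, field 1 a number.
pattern = re.compile(r"([A-Za-z]+)=([0-9]+)")
print(repr(pattern.sub(lambda m: expand_replace("?1 is ?0$", m), "width=42")))
# prints '42 is width\n'
```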
-
-
- 8. Feedback
- ===========
- Feedback and suggestions coming from evaluation and use of the compiler would be most welcome.
- Please contact:
-
- Mike Houlder,
- 6 Worrall Road,
- Sheffield,
- South Yorkshire S6 4BA
-
-